Overview

Dataset Statistics

Number of Variables 20
Number of Rows 59591
Missing Cells 237020
Missing Cells (%) 19.9%
Duplicate Rows 8041
Duplicate Rows (%) 13.5%
Total Size in Memory 47.1 MB
Average Row Size in Memory 829.2 B
Variable Types
  • Categorical: 6
  • Numerical: 14
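
The percentages above follow directly from the counts. A minimal sketch of the arithmetic, assuming the missing-cell percentage is taken over all cells (rows × variables) and that "MB" means 2**20 bytes; the report states neither convention:

```python
# Values copied from the Dataset Statistics block above.
n_vars = 20
n_rows = 59591
missing_cells = 237020
duplicate_rows = 8041
total_mb = 47.1  # assumption: mebibytes (2**20 bytes), not 10**6 bytes

total_cells = n_vars * n_rows                    # every cell in the table
missing_pct = 100 * missing_cells / total_cells  # share of empty cells
duplicate_pct = 100 * duplicate_rows / n_rows    # share of duplicated rows
avg_row_bytes = total_mb * 2**20 / n_rows        # bytes per row

print(f"missing cells: {missing_pct:.1f}%")
print(f"duplicate rows: {duplicate_pct:.1f}%")
print(f"average row size: {avg_row_bytes:.1f} B")
```

Under these assumptions the two percentages reproduce the reported 19.9% and 13.5%, and the row size lands within a byte of the reported 829.2 B; the small gap is consistent with the total size being rounded to one decimal.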

Dataset Insights

furniture has 42887 (71.97%) missing values Missing
floornumber has 58530 (98.22%) missing values Missing
direction has 44541 (74.74%) missing values Missing
bedroom has 754 (1.27%) missing values Missing
bathroom has 1089 (1.83%) missing values Missing
facade has 44249 (74.25%) missing values Missing
street_size has 44464 (74.62%) missing values Missing
price_per_square is skewed Skewed
floornumber is skewed Skewed
area is skewed Skewed
price is skewed Skewed
lat is skewed Skewed
lng is skewed Skewed
bedroom is skewed Skewed
bathroom is skewed Skewed
floors is skewed Skewed
facade is skewed Skewed
street_size is skewed Skewed
num_people is skewed Skewed
area(m2) is skewed Skewed
density(people/m2) is skewed Skewed
Dataset has 8041 (13.49%) duplicate rows Duplicates
address has a high cardinality: 13922 distinct values High Cardinality
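
As a consistency check, the per-column missing counts can be summed and compared with the dataset-level Missing Cells figure. This is a sketch using only numbers from this report; the `floors` count (506) comes from the Variables section, since that column does not appear among the insights.

```python
# Per-column missing counts as reported in the Variables section.
missing_by_column = {
    "furniture": 42887,
    "floornumber": 58530,
    "direction": 44541,
    "bedroom": 754,
    "bathroom": 1089,
    "floors": 506,  # reported under Variables, not listed as an insight
    "facade": 44249,
    "street_size": 44464,
}
n_rows = 59591

# Percentage of rows missing in each column, to two decimals.
for name, count in missing_by_column.items():
    print(f"{name}: {100 * count / n_rows:.2f}%")

# The sum should match the dataset-level "Missing Cells" figure.
total_missing = sum(missing_by_column.values())
print("total missing cells:", total_missing)
```

The total comes to 237020, which agrees with the Missing Cells value in Dataset Statistics.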

Variables

district

categorical

Approximate Distinct Count 30
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory Size 6.6 MB

Length

Mean 8.6656
Standard Deviation 1.6808
Median 9
Minimum 5
Maximum 12

Sample

1st row Ba Vì
2nd row Ba Vì
3rd row Ba Vì
4th row Ba Vì
5th row Ba Vì

Letter

Count 330688
Lowercase Letter 222328
Space Separator 70902
Uppercase Letter 108360
Dash Punctuation 0
Decimal Number 0

price_per_square

numerical

Approximate Distinct Count 9616
Approximate Unique (%) 16.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 931.1 KB
Mean 103.6491
Minimum 0.01
Maximum 1140
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • price_per_square is skewed right (γ1 = 3.0925)

Quantile Statistics

Minimum 0.01
5-th Percentile 17.57
Q1 40.51
Median 88.37
Q3 124
95-th Percentile 268.675
Maximum 1140
Range 1139.99
IQR 83.49
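
The report does not say which rule it uses to count outliers. A common choice is Tukey's rule, which flags values more than 1.5 × IQR outside the quartiles. The sketch below applies that rule to the quartiles reported here; the helper name `tukey_fences` and the factor `k=1.5` are assumptions, not something the report confirms.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence, IQR) for Tukey's outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr, iqr

# price_per_square quartiles from the block above.
pps_lower, pps_upper, pps_iqr = tukey_fences(40.51, 124.0)
print(f"price_per_square: IQR={pps_iqr:.2f}, fences=({pps_lower:.3f}, {pps_upper:.3f})")

# num_people quartiles from its Quantile Statistics block.
np_lower, np_upper, np_iqr = tukey_fences(275745, 371606)
print(f"num_people: IQR={np_iqr}, fences=({np_lower}, {np_upper})")
```

Under this assumed rule the upper fence for price_per_square (about 249.2) sits below the reported 95-th percentile (268.675), so more than 5% of rows would be flagged, in line with the 3493 outliers the report lists. For num_people the fences enclose both the minimum (135618) and the maximum (506347), which would explain why no outlier line appears for that column.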

Descriptive Statistics

Mean 103.6491
Standard Deviation 93.3681
Variance 8717.5964
Sum 6.1766e+06
Skewness 3.0925
Kurtosis 15.7424
Coefficient of Variation 0.9008
  • price_per_square is not normally distributed (p-value 4.657661942630837e-10)
  • price_per_square has 3493 outliers
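
Several of the descriptive statistics are tied together by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of non-missing rows. A short sketch that re-derives them from the rounded figures above (so small discrepancies in the last digits are expected):

```python
# Figures copied from the price_per_square block (no missing values).
n = 59591
mean = 103.6491
std = 93.3681

cv = std / mean      # coefficient of variation
variance = std ** 2  # variance from the (rounded) standard deviation
total = mean * n     # sum of all values

print(f"coefficient of variation: {cv:.4f}")
print(f"variance: {variance:.4f}")
print(f"sum: {total:.4e}")
```

The coefficient of variation reproduces 0.9008, the variance agrees with the reported 8717.5964 to within rounding of the standard deviation, and the sum comes out at about 6.1766 × 10^6.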

address

categorical

Approximate Distinct Count 13922
Approximate Unique (%) 23.4%
Missing 0
Missing (%) 0.0%
Memory Size 16.1 MB
  • The largest value (Đường Đại lộ Thăng Long, Phường Tây Mỗ, Quận Nam Từ Liêm, Hà Nội) is over 1.84 times larger than the second largest value (Đường Quốc Lộ 5, Thị trấn Trâu Quỳ, Huyện Gia Lâm, Hà Nội)

Length

Mean 57.4129
Standard Deviation 10.7256
Median 57
Minimum 14
Maximum 163

Sample

1st row Xã Tản Lĩnh, Huyện...
2nd row Thôn Nghe Xã Vân H...
3rd row Xã Yên Bài, Ba Vì,...
4th row Xã Yên Bài, Ba Vì,...
5th row Xã Yên Bài, Ba Vì,...

Letter

Count 1755692
Lowercase Letter 1167319
Space Separator 645951
Uppercase Letter 588373
Dash Punctuation 734
Decimal Number 27159
  • address contains many words: 2721 words

furniture

categorical

Approximate Distinct Count 4
Approximate Unique (%) 0.0%
Missing 42887
Missing (%) 72.0%
Memory Size 2.3 MB
  • The largest value (Nội thất đầy đủ) is over 1.71 times larger than the second largest value (Nội thất cao cấp)

Length

Mean 15.6413
Standard Deviation 0.882
Median 15
Minimum 12
Maximum 17

Sample

1st row Nội thất cao cấp
2nd row Nội thất đầy đủ
3rd row Bàn giao thô
4th row Nội thất cao cấp
5th row Nội thất đầy đủ

Letter

Count 133204
Lowercase Letter 116500
Space Separator 49890
Uppercase Letter 16704
Dash Punctuation 0
Decimal Number 0

floornumber

numerical

Approximate Distinct Count 45
Approximate Unique (%) 4.2%
Missing 58530
Missing (%) 98.2%
Infinite 0
Infinite (%) 0.0%
Memory Size 16.6 KB
Mean 13.1122
Minimum 1
Maximum 952
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • floornumber is skewed right (γ1 = 28.3292)

Quantile Statistics

Minimum 1
5-th Percentile 2
Q1 5
Median 10
Q3 18
95-th Percentile 30
Maximum 952
Range 951
IQR 13

Descriptive Statistics

Mean 13.1122
Standard Deviation 30.2178
Variance 913.1167
Sum 13912
Skewness 28.3292
Kurtosis 877.0718
Coefficient of Variation 2.3046
  • floornumber is not normally distributed (p-value 3.9186604150604204e-24)
  • floornumber has 17 outliers
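
With 98.2% of floornumber missing, it is worth noting which denominator the statistics use. The figures above are consistent with the mean and the unique percentage being computed over non-missing rows only; the sketch below checks this from the reported numbers (it is an inference from the report, not documented behavior of the tool).

```python
# Figures copied from the floornumber block.
n_rows = 59591
missing = 58530
distinct = 45
total = 13912  # reported Sum

present = n_rows - missing             # rows that actually carry a value
mean = total / present                 # mean over non-missing rows
unique_pct = 100 * distinct / present  # distinct share of non-missing rows

print("non-missing rows:", present)
print(f"mean: {mean:.4f}")
print(f"unique: {unique_pct:.1f}%")
```

Only 1061 rows carry a floor number, so every statistic in this block, including the 17 outliers and the maximum of 952, rests on a small subset of the data.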

area

numerical

Approximate Distinct Count 1268
Approximate Unique (%) 2.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 931.1 KB
Mean 63.0648
Minimum 1
Maximum 500
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • area is skewed right (γ1 = 3.336)

Quantile Statistics

Minimum 1
5-th Percentile 30
Q1 39
Median 51
Q3 73
95-th Percentile 133
Maximum 500
Range 499
IQR 34

Descriptive Statistics

Mean 63.0648
Standard Deviation 40.6975
Variance 1656.2844
Sum 3.7581e+06
Skewness 3.336
Kurtosis 17.5375
Coefficient of Variation 0.6453
  • area is not normally distributed (p-value 9.294200176355887e-13)
  • area has 3561 outliers

direction

categorical

Approximate Distinct Count 8
Approximate Unique (%) 0.1%
Missing 44541
Missing (%) 74.7%
Memory Size 1.5 MB

Length

Mean 6.3045
Standard Deviation 2.0407
Median 7
Minimum 3
Maximum 8

Sample

1st row Đông Bắc
2nd row Đông Nam
3rd row Đông Nam
4th row Nam
5th row Nam

Letter

Count 58552
Lowercase Letter 39939
Space Separator 10659
Uppercase Letter 18613
Dash Punctuation 0
Decimal Number 0

price

numerical

Approximate Distinct Count 2970
Approximate Unique (%) 5.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 931.1 KB
Mean 6.9781
Minimum 0.001
Maximum 282
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • price is skewed right (γ1 = 8.4315)

Quantile Statistics

Minimum 0.001
5-th Percentile 1
Q1 2.4
Median 3.8
Q3 6.6
95-th Percentile 24
Maximum 282
Range 281.999
IQR 4.2

Descriptive Statistics

Mean 6.9781
Standard Deviation 12.053
Variance 145.2757
Sum 415831.8907
Skewness 8.4315
Kurtosis 118.516
Coefficient of Variation 1.7273
  • price is not normally distributed (p-value 1.0040843334800169e-23)
  • price has 6918 outliers

lat

numerical

Approximate Distinct Count 4414
Approximate Unique (%) 7.4%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 931.1 KB
Mean 21.0131
Minimum 20.6307
Maximum 21.3165
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • lat is skewed right (γ1 = 0.4647)

Quantile Statistics

Minimum 20.6307
5-th Percentile 20.9582
Q1 20.9869
Median 21.0114
Q3 21.0369
95-th Percentile 21.0734
Maximum 21.3165
Range 0.6857
IQR 0.04999

Descriptive Statistics

Mean 21.0131
Standard Deviation 0.03914
Variance 0.001532
Sum 1.2522e+06
Skewness 0.4647
Kurtosis 5.0581
Coefficient of Variation 0.001863
  • lat is not normally distributed (p-value 1.9538360642099337e-07)
  • lat has 907 outliers

lng

numerical

Approximate Distinct Count 4593
Approximate Unique (%) 7.7%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 931.1 KB
Mean 105.814
Minimum 105.3503
Maximum 107.0768
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • lng is skewed left (γ1 = -0.3427)

Quantile Statistics

Minimum 105.3503
5-th Percentile 105.7406
Q1 105.7856
Median 105.8112
Q3 105.844
95-th Percentile 105.9075
Maximum 107.0768
Range 1.7266
IQR 0.0584

Descriptive Statistics

Mean 105.814
Standard Deviation 0.05655
Variance 0.003198
Sum 6.3056e+06
Skewness -0.3427
Kurtosis 23.6917
Coefficient of Variation 0.00053446
  • lng is not normally distributed (p-value 3.0579026806282553e-15)
  • lng has 2069 outliers

bedroom

numerical

Approximate Distinct Count 46
Approximate Unique (%) 0.1%
Missing 754
Missing (%) 1.3%
Infinite 0
Infinite (%) 0.0%
Memory Size 919.3 KB
Mean 3.6087
Minimum 1
Maximum 100
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • bedroom is skewed right (γ1 = 7.9787)

Quantile Statistics

Minimum 1
5-th Percentile 1
Q1 2
Median 3
Q3 4
95-th Percentile 6
Maximum 100
Range 99
IQR 2

Descriptive Statistics

Mean 3.6087
Standard Deviation 2.2601
Variance 5.1079
Sum 212326
Skewness 7.9787
Kurtosis 163.9514
Coefficient of Variation 0.6263
  • bedroom is not normally distributed (p-value 3.5274322735116334e-20)
  • bedroom has 1895 outliers

bathroom

numerical

Approximate Distinct Count 30
Approximate Unique (%) 0.1%
Missing 1089
Missing (%) 1.8%
Infinite 0
Infinite (%) 0.0%
Memory Size 914.1 KB
Mean 3.2225
Minimum 1
Maximum 42
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • bathroom is skewed right (γ1 = 3.6199)

Quantile Statistics

Minimum 1
5-th Percentile 1
Q1 2
Median 3
Q3 4
95-th Percentile 5
Maximum 42
Range 41
IQR 2

Descriptive Statistics

Mean 3.2225
Standard Deviation 1.5296
Variance 2.3398
Sum 188523
Skewness 3.6199
Kurtosis 55.2181
Coefficient of Variation 0.4747
  • bathroom is not normally distributed (p-value 7.111042915645123e-16)
  • bathroom has 486 outliers

floors

numerical

Approximate Distinct Count 37
Approximate Unique (%) 0.1%
Missing 506
Missing (%) 0.8%
Infinite 0
Infinite (%) 0.0%
Memory Size 923.2 KB
Mean 3.6489
Minimum 1
Maximum 105
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • floors is skewed right (γ1 = 4.0565)

Quantile Statistics

Minimum 1
5-th Percentile 1
Q1 1
Median 4
Q3 5
95-th Percentile 6
Maximum 105
Range 104
IQR 4

Descriptive Statistics

Mean 3.6489
Standard Deviation 2.0748
Variance 4.3048
Sum 215597
Skewness 4.0565
Kurtosis 134.1045
Coefficient of Variation 0.5686
  • floors is not normally distributed (p-value 6.958356692851997e-20)
  • floors has 72 outliers

house_type

categorical

Approximate Distinct Count 6
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 7.6 MB
  • The largest value (Nhà trong hẻm) is over 1.92 times larger than the second largest value (Căn hộ chung cư)

Length

Mean 13.2154
Standard Deviation 2.2271
Median 13
Minimum 7
Maximum 26

Sample

1st row Bán đất
2nd row Bán đất
3rd row Bán đất
4th row Bán đất
5th row Bán đất

Letter

Count 493889
Lowercase Letter 431520
Space Separator 133384
Uppercase Letter 62369
Dash Punctuation 2778
Decimal Number 0
  • The top 2 categories (Nhà trong hẻm, Căn hộ chung cư) take over 50.0%

facade

numerical

Approximate Distinct Count 312
Approximate Unique (%) 2.0%
Missing 44249
Missing (%) 74.2%
Infinite 0
Infinite (%) 0.0%
Memory Size 239.7 KB
Mean 5.8113
Minimum 0
Maximum 478
Zeros 26
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • facade is skewed right (γ1 = 38.1067)

Quantile Statistics

Minimum 0
5-th Percentile 3.5
Q1 4
Median 4.6
Q3 6
95-th Percentile 11
Maximum 478
Range 478
IQR 2

Descriptive Statistics

Mean 5.8113
Standard Deviation 9.4876
Variance 90.0142
Sum 89157.62
Skewness 38.1067
Kurtosis 1750.8874
Coefficient of Variation 1.6326
  • facade is not normally distributed (p-value 5.421133029061788e-25)
  • facade has 1217 outliers

street_size

numerical

Approximate Distinct Count 115
Approximate Unique (%) 0.8%
Missing 44464
Missing (%) 74.6%
Infinite 0
Infinite (%) 0.0%
Memory Size 236.4 KB
Mean 9.0355
Minimum 0
Maximum 386
Zeros 42
Zeros (%) 0.1%
Negatives 0
Negatives (%) 0.0%
  • street_size is skewed right (γ1 = 6.4813)

Quantile Statistics

Minimum 0
5-th Percentile 2.2
Q1 3
Median 5
Q3 10
95-th Percentile 30
Maximum 386
Range 386
IQR 7

Descriptive Statistics

Mean 9.0355
Standard Deviation 10.615
Variance 112.6775
Sum 136680.57
Skewness 6.4813
Kurtosis 135.2011
Coefficient of Variation 1.1748
  • street_size is not normally distributed (p-value 1.2210417749012195e-22)
  • street_size has 1332 outliers

num_people

numerical

Approximate Distinct Count 30
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 931.1 KB
Mean 323942.5093
Minimum 135618
Maximum 506347
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • num_people is skewed right (γ1 = 0.4111)

Quantile Statistics

Minimum 135618
5-th Percentile 160495
Q1 275745
Median 303586
Q3 371606
95-th Percentile 506347
Maximum 506347
Range 370729
IQR 95861

Descriptive Statistics

Mean 323942.5093
Standard Deviation 85836.002
Variance 7.3678e+09
Sum 1.9304e+10
Skewness 0.4111
Kurtosis 0.2438
Coefficient of Variation 0.265
  • num_people is not normally distributed (p-value 2.942594726082149e-10)

area(m2)

numerical

Approximate Distinct Count 30
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 931.1 KB
Mean 36.9968
Minimum 5.3
Maximum 423
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • area(m2) is skewed right (γ1 = 3.4079)

Quantile Statistics

Minimum 5.3
5-th Percentile 9.1
Q1 10.3
Median 32.2
Q3 49.6
95-th Percentile 116.7
Maximum 423
Range 417.7
IQR 39.3

Descriptive Statistics

Mean 36.9968
Standard Deviation 36.5618
Variance 1336.7647
Sum 2.2047e+06
Skewness 3.4079
Kurtosis 19.9379
Coefficient of Variation 0.9882
  • area(m2) is not normally distributed (p-value 3.0850449259325217e-19)
  • area(m2) has 3113 outliers

density(people/m2)

numerical

Approximate Distinct Count 30
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 931.1 KB
Mean 16447.8807
Minimum 687
Maximum 37161
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • density(people/m2) is skewed right (γ1 = 0.5066)

Quantile Statistics

Minimum 687
5-th Percentile 2452
Q1 7529
Median 12564
Q3 25588
95-th Percentile 37161
Maximum 37161
Range 36474
IQR 18059

Descriptive Statistics

Mean 16447.8807
Standard Deviation 11623.7526
Variance 1.3511e+08
Sum 9.8015e+08
Skewness 0.5066
Kurtosis -1.275
Coefficient of Variation 0.7067
  • density(people/m2) is not normally distributed (p-value 1.6421663422411395e-10)
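
The negative kurtosis reported here (-1.275) indicates that the report shows excess kurtosis, that is, kurtosis minus 3, because raw kurtosis is never below 1. The sketch below computes skewness (γ1) and excess kurtosis from central moments. It uses the simple population formulas; whether the report applies a bias correction is not stated, so treat the estimator choice as an assumption. The function name `skew_and_excess_kurtosis` is hypothetical.

```python
def skew_and_excess_kurtosis(values):
    """Population skewness and excess kurtosis from central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skew, excess_kurtosis

# A symmetric, flat sample: zero skew, negative excess kurtosis.
flat_skew, flat_kurt = skew_and_excess_kurtosis([1, 2, 3, 4, 5])

# A sample with a long right tail: positive skew.
tail_skew, tail_kurt = skew_and_excess_kurtosis([1, 1, 1, 2, 10])

print(flat_skew, flat_kurt)
print(tail_skew, tail_kurt)
```

A flat, spread-out distribution such as the first sample yields negative excess kurtosis, the same sign as the value reported for this column, while a long right tail yields positive skewness, as seen for most numerical columns in this report.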

note

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 6.0 MB
  • The largest value (quận) is over 9.32 times larger than the second largest value (huyện)

Length

Mean 4.1002
Standard Deviation 0.306
Median 4
Minimum 4
Maximum 6

Sample

1st row huyện
2nd row huyện
3rd row huyện
4th row huyện
5th row huyện

Letter

Count 184537
Lowercase Letter 184537
Space Separator 104
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (quận, huyện) take over 50.0%
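
The category counts for note are not listed, but they can be estimated from the length statistics. This sketch assumes that "quận" has length 4, "huyện" has length 5, that the third category is the only one of length 6, and that each of its values contains exactly one space (so the 104 spaces correspond to 104 rows). These are inferences from the figures above, and because the mean length is rounded to four decimals the counts are only accurate to within a few rows.

```python
# Figures copied from the note block.
n_rows = 59591
mean_len = 4.1002
spaces = 104

len_quan, len_huyen, len_third = 4, 5, 6  # assumed string lengths
n_third = spaces                          # assumed: one space per value

# Total characters, then remove the third category's contribution.
total_len = mean_len * n_rows
rest_rows = n_rows - n_third
rest_len = total_len - len_third * n_third

# Solve: n_quan + n_huyen = rest_rows and 4*n_quan + 5*n_huyen = rest_len.
n_huyen = round(rest_len - len_quan * rest_rows)
n_quan = rest_rows - n_huyen
ratio = n_quan / n_huyen

print("quận:", n_quan, "huyện:", n_huyen)
print(f"ratio: {ratio:.2f}")
```

Under these assumptions the estimated ratio between the two largest categories is consistent with the "over 9.32 times" statement above.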

Interactions

Correlations

Missing Values